docs(napkin-math): proposal 141 — source-preservation audit by neoneye · Pull Request #739 · PlanExeOrg/PlanExe

neoneye · 2026-05-20T23:46:43Z

Summary

Adds proposal 141 to docs/proposals/ covering a deterministic source-preservation audit for the napkin_math pipeline. Doc-only PR — no code or prompt changes.

What the proposal specifies

- Two orthogonal audit forks sharing one schema:
  - Fork A: source/digest threshold-like claims vs. current parameters.json
  - Fork B: prior-baseline signals vs. current parameters.json
- New optional schema fields on extract artifacts: source_claim_ids (per entry) and dropped_signals (top-level array)
- Deterministic source_claim_id hashes for mechanical claim-to-declaration matching
- Closed enum for reason (replaced_by, cap_pressure, out_of_scope, moved_to_unmodelled_gate, redundant_with) with dedicated reference fields per reason
- Validation rules on dropped_signals itself (e.g. cap_pressure must actually match a capped array at cap)
- A new advisory experiments/napkin_math/audit_source_preservation.py script (no LLM call)
- Phased rollout: schema docs → advisory script → corpus probe → orchestrator integration → strict policy
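To make the hash and validation items above concrete, here is a minimal sketch, not the proposal's implementation. The normalisation steps, the `claim_` prefix, the 12-character truncation, the reference-field names (`replaced_by_id`, `capped_array`, `gate_id`, `redundant_with_id`) and the `arrays`/`caps` arguments are all hypothetical; only the reason names come from this PR description.

```python
import hashlib
import re

# Reason names are the closed enum listed in this PR. The reference-field
# names on the right are hypothetical placeholders; the PR only says each
# reason has a dedicated reference field.
REASON_REFERENCE_FIELD = {
    "replaced_by": "replaced_by_id",
    "cap_pressure": "capped_array",
    "out_of_scope": None,
    "moved_to_unmodelled_gate": "gate_id",
    "redundant_with": "redundant_with_id",
}


def source_claim_id(source_text: str) -> str:
    """Hash a whitespace- and case-normalised claim so equal claims get equal ids.

    The normalisation and the 'claim_' prefix are assumptions for illustration.
    """
    normalised = re.sub(r"\s+", " ", source_text).strip().lower()
    digest = hashlib.sha256(normalised.encode("utf-8")).hexdigest()
    return "claim_" + digest[:12]


def validate_dropped_signal(entry: dict, arrays: dict, caps: dict) -> list:
    """Return a list of problems for one dropped_signals entry (empty means valid)."""
    reason = entry.get("reason")
    if reason not in REASON_REFERENCE_FIELD:
        return [f"unknown reason: {reason!r}"]
    problems = []
    field = REASON_REFERENCE_FIELD[reason]
    if field is not None and not entry.get(field):
        problems.append(f"reason {reason!r} requires field {field!r}")
    # cap_pressure is only credible if the named array really is full.
    if reason == "cap_pressure" and entry.get(field):
        name = entry[field]
        if name not in caps or len(arrays.get(name, [])) < caps[name]:
            problems.append(f"array {name!r} is not at its cap")
    return problems
```

Because the id depends only on the normalised text, two runs over the same source produce the same id, which is what allows claim-to-declaration matching without an LLM call.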

Why now

Two failure modes the existing depends-on audit cannot catch:

1. Source-stated threshold absent from output (the extractor recognises it but omits it)
2. Prior-baseline signal absent from the current version (a silent regression and a structural improvement look identical)

Both predate this proposal and were surfaced during the PR #737 v50 prompt-cleanup work. The proposal commits to a design before implementation lands.

What this PR does NOT do

- Does not implement the schema field or the audit script; that is the Phase 1-4 follow-up work in the proposal itself.
- Does not modify any prompt or code file.
- Does not introduce corpus literals anywhere.

Commit chain

aaceee55 — Initial proposal draft
3c47a3ab — ChatGPT-led restructuring (added Pitch, Feasibility, Implementation Phases, Success Metrics, Risks, Acceptance, Open Questions; deterministic source_claim_id hash; closed-enum reason with dedicated reference fields; validation rules on dropped_signals)

Test plan

- Proposal follows docs/proposals/AGENTS.md formatting
- No corpus literals (grep -nE "€[0-9]|km²|GW|GVA|RTE|DGSI|DREAL|OPC UA|paperclip|hyperscale|yellowstone|mars_gtld|euro_adoption|crate_recovery|datacenter" docs/proposals/141-source-preservation-audit.md returns nothing)
- Phase 1-4 implementation tracked in a follow-up PR

…ipeline

Two failure modes the existing depends-on audit cannot catch:

1. Source-stated thresholds (floors, caps, targets, deadlines) silently absent from parameters.json. The plan names the gate; the extractor recognises it but omits it; downstream stages cannot test it.
2. Prior-baseline variables silently dropped between vN-1 and vN. May be a structural improvement (replaced by a better-named equivalent) or a silent regression. Same depends-on audit passes either way.

Proposal specifies:

- A new optional 'dropped_signals' field on the extract artifact's JSON shape, with a closed enumeration of allowed reasons (replaced_by, cap_pressure, out_of_scope, unmodelled_external, redundant_with).
- A source-preservation rule added to the extract skill system prompts, requiring either preservation or explicit drop justification for every source-stated or prior-baseline signal.
- A deterministic Python audit script (no LLM call) that scans the digest for threshold patterns, diffs the prior baseline id-set against the current, and flags unjustified drops.
- Scope deliberately limited to the extract stage. Compress preservation is a different problem (the LLM is meant to drop content there). Downstream deterministic stages preserve by construction.

Corpus-agnostic by design: enumeration members are structural categories, regex patterns target comparison-word structures, and the rule applies to any plan in any domain. Plan-name references in the doc text itself are humans-only context, not prompt content.
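The two checks the audit script is described as performing (threshold-pattern scan and id-set diff) could look roughly like the sketch below. It is an illustration under stated assumptions, not the script from the proposal: the comparison-word list, the function names, and the `id` key on dropped_signals entries are all hypothetical.

```python
import re

# Hypothetical comparison-word pattern. The commit message only says the
# regexes target comparison-word structures, not which words are used.
THRESHOLD_PATTERN = re.compile(
    r"\b(at least|at most|no more than|no less than|minimum|maximum|must not exceed)\b",
    re.IGNORECASE,
)


def find_threshold_lines(digest_text: str) -> list:
    """Return digest lines that contain a threshold-like comparison phrase."""
    return [
        line.strip()
        for line in digest_text.splitlines()
        if THRESHOLD_PATTERN.search(line)
    ]


def unjustified_drops(prior_ids: set, current_ids: set, dropped_signals: list) -> list:
    """Ids in the prior baseline that are gone now and not declared as dropped.

    Assumes each dropped_signals entry names the dropped id under an 'id' key.
    """
    justified = {entry.get("id") for entry in dropped_signals}
    return sorted(prior_ids - current_ids - justified)
```

For example, with a prior baseline of `{"a", "b", "c"}`, a current set of `{"a"}`, and a single dropped_signals entry for `"b"`, only `"c"` is reported. Both checks are plain set and regex operations, so the result is the same on every run.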

Replaces the prior 'Status as of 2026-05-21' content with three explicit subsections:

1. Landed on main: PR #737 (Phase 1 compress + initial extract threshold-pairing + OPTIMIZE_INSTRUCTIONS) and PR #739 (Proposal 141 design only).
2. Open for merge: PR #740 commit chain (4cda70b source-arithmetic + parity, 19f927b aggregate-sum tightening, 8f94c8c source_text truncation discipline). All edits applied symmetrically to both extract skills.
3. PR #740 verification posture: same-LLM same-session regression check, not improvement proof. All six v51 parameters.json files validate clean. Behavioural verification of the rules on a different LLM is a separate piece of follow-up work, not part of PR #740.

Known limitations section now explicitly names the clearest unresolved regression (paperclip OPC UA / p99 latency at compress stage), the cap-pressure-without-recorded-rationale gap (yellowstone public_compliance trio), and the absence of a source-preservation audit implementation (proposal 141 design merged, code not).

Lists four follow-up PRs in preferred order; bundling them re-creates the scope creep PR #740 was extracted from.

neoneye added 2 commits May 21, 2026 01:35

chatgpt tweaks

3c47a3a

neoneye merged commit 9531f6b into main May 20, 2026
3 checks passed

neoneye deleted the napkin-math/source-preservation-audit branch May 20, 2026 23:49
